Search for: All records (page 1 of 1; 2 total resources)
Authors / Contributors: Livesay, Michael (2); Hajek, Bruce (1); Lubars, Joseph (1); Srikant, R (1); Winnicki, Anna (1)
In many applications of reinforcement learning, the underlying system dynamics are known, but computing the optimal policy is still difficult because the state space can be enormously large. For example, Shannon famously estimated the number of states in chess to be approximately 10^120. To handle such enormous state spaces, policy iteration algorithms make two major approximations: it is assumed that the value function lies in a lower-dimensional space, and only a few steps of a trajectory (called a rollout) are used to evaluate a policy. Using a counterexample, we show that these approximations can lead to the divergence of policy iteration. We then show that sufficient lookahead in the policy improvement step mitigates this divergence and leads to algorithms with bounded errors, and that these errors can be controlled by appropriately choosing the amounts of lookahead and rollout.
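The lookahead-and-rollout scheme described in this abstract can be sketched on a toy MDP with known dynamics. Everything below (the random MDP, the feature dimension `d`, and the parameters `H` and `m`) is illustrative rather than the paper's actual setup: `H` Bellman-optimality backups stand in for lookahead, an `m`-step rollout under the greedy policy stands in for approximate policy evaluation, and projection onto a random low-dimensional feature subspace stands in for the value-function approximation.

```python
import numpy as np

# Toy MDP with known dynamics (sizes are illustrative).
rng = np.random.default_rng(0)
n, n_actions, gamma = 20, 2, 0.9
P = rng.dirichlet(np.ones(n), size=(n_actions, n))  # P[a, s] = next-state distribution
R = rng.uniform(0, 1, size=(n_actions, n))          # R[a, s] = expected reward

def bellman(V):
    """One exact Bellman optimality backup (possible because dynamics are known)."""
    Q = R + gamma * np.einsum('asj,j->as', P, V)
    return Q.max(axis=0), Q.argmax(axis=0)

def lookahead_rollout_pi(H=3, m=5, d=4, iters=30):
    """Approximate policy iteration: H-step lookahead for policy improvement,
    m-step rollout for policy evaluation, and a rank-d linear value-function
    approximation applied via projection onto the feature subspace."""
    Phi = rng.standard_normal((n, d))       # random features (illustrative)
    proj = Phi @ np.linalg.pinv(Phi)        # orthogonal projector onto span(Phi)
    V = np.zeros(n)
    for _ in range(iters):
        W = V.copy()
        for _ in range(H):                  # H-step lookahead: repeated backups
            W, pi = bellman(W)
        idx = np.arange(n)
        for _ in range(m):                  # m-step rollout under greedy policy pi
            W = R[pi, idx] + gamma * P[pi, idx] @ W
        V = proj @ W                        # project back onto the feature space
    return V, pi
```

The projection composed with Bellman backups need not be a contraction, which is the mechanism behind the divergence the abstract mentions; making `H + m` large shrinks the pre-projection map and keeps the iterates bounded, loosely mirroring the role of lookahead and rollout in the result.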
-
Hajek, Bruce; Livesay, Michael (2019 58th IEEE Conference on Decision and Control). The theory of mean field games is a tool for understanding noncooperative dynamic stochastic games with a large number of players. Much of the theory has evolved under conditions ensuring uniqueness of the mean field game Nash equilibrium. In some situations, however, typically involving symmetry breaking, non-uniqueness of solutions is an essential feature. To investigate the nature of non-unique solutions, this paper focuses on a technically simple setting: players have one of two states, dynamics are in continuous time, the game is symmetric in the players, and players are restricted to Markov strategies. All mean field game Nash equilibria are identified for a symmetric follow-the-crowd game. Such equilibria correspond to symmetric $\epsilon$-Nash Markov equilibria for $N$ players, with $\epsilon$ converging to zero as $N$ goes to infinity. In contrast to the mean field game, there is a unique Nash equilibrium for finite $N$. It is shown that fluid limits arising from the Nash equilibria for finite $N$ as $N$ goes to infinity are mean field game Nash equilibria, and evidence is given supporting the conjecture that such limits, among all mean field game Nash equilibria, are the ones that are stable fixed points of the mean field best response mapping.
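The non-uniqueness phenomenon can be illustrated with a stationary caricature of a two-state follow-the-crowd game. The logistic best response and the parameter `beta` below are illustrative stand-ins, not the paper's continuous-time model; the point is only that the mean field best-response map can have several fixed points.

```python
import numpy as np

def best_response_fraction(m, beta=8.0):
    """Fraction of players choosing state 1 when the crowd fraction in state 1
    is m. A logistic (softened) response with inverse temperature beta stands
    in for an exact best response; the flow reward for state 1 is m and for
    state 0 is 1 - m, so players prefer to follow the crowd."""
    return 1.0 / (1.0 + np.exp(-beta * (m - (1.0 - m))))

def find_fixed_points(grid=2001, tol=1e-3):
    """Scan [0, 1] for approximate fixed points m = BR(m) of the mean field
    best-response map; each cluster of hits marks one equilibrium."""
    ms = np.linspace(0.0, 1.0, grid)
    gaps = np.abs(best_response_fraction(ms) - ms)
    return [float(m) for m, g in zip(ms, gaps) if g < tol]
```

With `beta = 8` the scan finds fixed points near 0, at 1/2, and near 1. The slope of the best-response map at m = 1/2 is beta/2 > 1, so the symmetric equilibrium is an unstable fixed point of best-response iteration while the two symmetry-broken equilibria are stable, loosely echoing the stability conjecture in the abstract.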